Abstract

Farming was established in Central Europe by the Linearbandkeramik culture (LBK), a well-investigated
archaeological horizon that emerged in the Carpathian Basin, in today's Hungary.
However, the genetic background of the LBK genesis has not yet been revealed. Here we present
9 Y chromosomal and 84 mitochondrial DNA profiles from Mesolithic, Neolithic Starcevo and LBK
sites (7th/6th millennium BC) from the Carpathian Basin and south-eastern Europe. We detect
genetic continuity of both maternal and paternal elements during the initial spread of
agriculture, and confirm the substantial genetic impact of early farming south-eastern European
and Carpathian Basin cultures on Central European populations of the 6th-4th millennium BC. Our
comprehensive Y chromosomal and mitochondrial DNA population genetic analyses demonstrate
a clear affinity of the early farmers to the modern Near East and Caucasus, tracing the expansion
from that region through south-eastern Europe and the Carpathian Basin into Central Europe.
Our results also reveal contrasting patterns for male and female genetic diversity in the European
Neolithic, suggesting a patrilineal descent system and patrilocal residential rules among the early
farmers.

Author Summary 

We report an exceptionally large Neolithic DNA dataset from the Carpathian Basin, which
was the cradle of the first Central European farming culture, the so-called Linearbandkeramik
culture. We generated 9 Y chromosomal and 84 mitochondrial DNA profiles from Mesolithic and
Neolithic specimens from western Hungary and Croatia, attributed to the hunter-gatherers,
Starcevo and LBK cultures (7th/6th millennium BC). We observe genetic discontinuity between
Mesolithic foragers and early farmers, and genetic continuity between farming populations of
the 6th-4th millennium BC across a vast territory of south-eastern and Central Europe. Nine novel Y
chromosome DNA profiles offer first insights into the Y chromosome diversity of the earliest
European farmers, and further support the migration (demic diffusion) from the Near East into
Central Europe along the Continental route of Neolithisation. The joint analysis of the two
uniparental genetic systems leads us to conclude that men and women played similar roles in the Early
Neolithic migration process, but that their dispersal patterns were determined by sex-specific rules.






Downloaded from http://biorxiv.org/ on September 18, 2014 

Introduction 

Agriculture was first established in the Near Eastern Fertile Crescent after 10,000 BC and 
expanded from the Levant and Anatolia to south-eastern Europe [1]. Archaeological research has 
described the subsequent spread of Neolithic farming into (and throughout) Central and south- 
western Europe along two major and largely contemporaneous routes. On the Continental route, 
the Carpathian Basin connected south-eastern Europe to the Central European loess plains, while 
the Mediterranean route bridged the eastern and western Mediterranean coasts, introducing 
farming to the Iberian Peninsula in the far West [2-6]. 

On the Continental route, the Early Neolithic Starcevo culture (STA) played a major
role in the Neolithisation of south-eastern Europe. The STA expanded from present-day Serbia to
the western part of the Carpathian Basin, encompassing the regions of today's northern Croatia
and south-western Hungary (ca. 6,000-5,400 BC) [7,8] (Figure 1), and gave rise to the
Linearbandkeramik culture (LBK) [9]. The earliest LBK emerged in the mid-6th millennium
BC in Transdanubia (called "LBK in Transdanubia", or LBKT) [9], marking the beginning of
sedentary life in northern Hungary and, via this area, in Central Europe. The earliest LBKT coexisted
with the STA in Transdanubia for about 100-150 years [10]. Archaeological research has described an
interaction zone between indigenous hunter-gatherer groups and farmers at the northernmost
extent of the STA in Transdanubia, which might have led to the genesis of the LBKT [10,11]. After
its formative phase in western Hungary, the LBK spread rapidly into Central Europe, reaching
central Germany around 5,500 BC [2,12]. Over the following 500 years, the LBK continually
expanded, eventually covering a vast geographic area from the Paris Basin to Ukraine in its latest
phase [2,13], and persisted in Transdanubia until ~4,900 BC (Figure 1).

Despite the well-established archaeological relations between the STA and LBKT in the
Carpathian Basin and the LBK in Central Europe, their genetic relationship has hitherto remained
unknown. Traditionally, scholars have explained the Neolithic transition either as an expansion of
early farmers from the Near East, who brought new ideas as well as new genes (demic diffusion)
[14-17], or as an adoption of farming technologies by indigenous hunter-gatherer populations
with little or no genetic influence (cultural diffusion) [18-21]. These two contrasting models have
been merged into complex integrationist approaches that consider small-scale population
movements at regional levels [1,2,10,22].

Inferences drawn from genetic studies based on present-day data have yielded 
contradictory results about the Neolithic impact on the genetic diversity of modern Europeans, 
showing a disparity between mitochondrial DNA (mtDNA) and Y chromosomal patterns. Several Y 
chromosome studies supported the Neolithic demic diffusion model [17,23,24], while most 
mtDNA and some Y chromosomal studies have proposed a continuity of Upper Palaeolithic 
lineages [20,21,25,26]. The contrasting mtDNA and Y chromosomal evidence has been explained 
by differences in evolutionary scenarios, such as sex-biased migration [27]. 

Recent ancient DNA (aDNA) studies have provided direct insights into the mtDNA and
autosomal diversity of hunter-gatherers in Europe [28-33] and the Central European LBK [33-37],
describing a clear genetic discontinuity between local foragers and early farmers [28,31,36].
Comparative analyses with present-day populations have revealed Near Eastern affinities of the
mitochondrial LBK ancestry, supporting the demic diffusion model and population replacement at
the beginning of the Neolithic period [36,37]. Data on Y chromosomal diversity in Neolithic
Europe are still scarce. Besides the first Mesolithic and Neolithic hunter-gatherer Y chromosomal
data described recently [33,38], Y chromosome data have been reported from a few LBK samples
[36], the Tyrolean Iceman [39], the southwest European Neolithic [40,41], and the Late
Neolithic of Central Germany [42,43].

The postulated Near Eastern origin of Central Europe's LBK farmers has so far only been
inferred from modern-day population data. The first ancient mtDNA data from early Near
Eastern farmers have been reported recently [44]; however, the genetic diversity of the vast
territory between the Fertile Crescent and Central Europe remains largely unexplored. Consequently,
our aims were to i) study the genetic diversity of the early farming Carpathian Basin cultures
from both the mtDNA and Y chromosome perspectives, ii) examine whether men and women
had different demographic histories, iii) investigate the contribution of the STA to the genetic
variability of the LBKT and LBK, iv) reveal the potential genetic origins of the first farmers in
Eurasia, and v) assess the role of the Continental route in the European Neolithic dispersal.

In this study we present 84 mtDNA and 9 Y chromosomal DNA profiles from Mesolithic
(6,200-6,000 BC) and Neolithic specimens of the STA and LBKT from western Hungary and
Croatia, spanning ~900 years (ca. 5,800-4,900 BC) of the Neolithic period. Our population genetic
analyses provide detailed insight into the role of archaeological cultures from the Carpathian
Basin in the spread of farming from the Near East.

Results

Mitochondrial DNA 

Using well-established aDNA methods (Material and Methods), we genotyped mtDNA
variability by sequencing hypervariable segments I and II (HVS-I/II) and typing 22 single nucleotide
polymorphisms (SNPs) in the coding region of the mitochondrial genome [36]. Overall, we
investigated 109 skeletons from one Mesolithic, six STA and eight LBKT sites in western
Hungary and Croatia (Figure 1, Dataset S1-S2). We successfully genotyped endogenous HVS-I
sequences of 84 individuals (hunter-gatherer=1, STA=44, and LBKT=39), yielding a success rate of
76% (Dataset S3). We also sequenced parts of HVS-II from 25 individuals with consistent and
identical HVS-I motifs in order to increase the phylogenetic resolution and to detect potential
intra-site maternal kinship. The analysis of haplogroup-defining coding region SNPs provided
reproducible profiles for 96 individuals, with a success rate of 86% (Dataset S3-S4).

The haplotype of the Mesolithic skeleton from the Croatian island of Korcula belongs to mtDNA
haplogroup U5b2a5 (Dataset S3). Sub-haplogroup U5b has been shown to be frequent in
pre-Neolithic hunter-gatherer communities across Europe [28-30,32,33,45,46]. In contrast to the
low mtDNA diversity reported from hunter-gatherers of Central/North Europe [28-30], we
identify substantially higher variability in early farming communities of the Carpathian Basin,
including haplogroups N1a, I1, T2, J, K, H, HV, V, W, X, U2, U3, U4, and U5a (Table 1).
Previous studies have shown that haplogroups N1a, T2, J, K, HV, V, W and X are most
characteristic of the Central European LBK and have described these haplogroups as the
mitochondrial 'Neolithic package' that reached Central Europe in the 6th millennium BC
[36,37]. Interestingly, most of these haplogroups show comparable frequencies in the
STA, LBKT and LBK, comprising the majority of mtDNA variation in each culture (STA=86.36%,
LBKT=61.54%, LBK=79.63%). In contrast, hunter-gatherer haplogroups are rare in the STA and
both LBK groups (Table 1). Besides similar haplogroup compositions, we also found comparable
haplotype diversity values for each culture (STA=0.97674, LBKT=0.95277, LBK=0.95483).
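Haplotype diversity expresses the probability that two sequences drawn at random from a sample differ. The software used for the values above is described in Material and Methods; as a minimal sketch, assuming Nei's unbiased estimator H = n/(n-1) (1 - Σ p_i²), where p_i is the frequency of the i-th distinct haplotype among n sequences, the statistic can be computed as follows. Haplotypes are compared as whole strings (a simplifying assumption), and the labels in the usage line are hypothetical placeholders, not study data.

```python
from collections import Counter
from typing import Sequence


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity, H = n/(n-1) * (1 - sum(p_i^2)).

    Each element is one individual's haplotype written as a string; identical
    strings are counted as the same haplotype.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are needed")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy sample: four individuals carrying three distinct haplotypes.
print(round(haplotype_diversity(["h1", "h1", "h2", "h3"]), 4))  # prints 0.8333
```

The n/(n-1) factor corrects the small-sample bias of the naive estimator 1 - Σ p_i², which matters for site-level samples of a few dozen individuals.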

In order to evaluate whether the haplogroup and haplotype compositions of the STA, LBKT,
LBK [34-37] and hunter-gatherers from Central/North Europe [28-30] differ significantly from
each other, we performed a haplogroup-based Fisher's exact test and a sequence-based genetic
distance analysis. In addition, we used the test of population continuity (TPC) [37] to elucidate
whether the observed differences can best be explained by genetic drift or by other factors such
as migration. These analyses reveal that the mtDNA composition of the Early Neolithic cultures is
significantly different from that of the hunter-gatherers, both at the haplogroup (p=0.0001) and at the
haplotype level (Fst=0.17989-0.18810, p=0.0000) (Table 1), indicating genetic discontinuity of
maternal elements at the advent of farming in the Carpathian Basin, as previously reported
for Central Europe [28,36,37]. The TPC shows that, independent of the tested
effective population size, the transition from hunter-gathering to farming cannot be explained by
genetic drift alone (p<0.000001, Dataset S11). More importantly, non-significant differences
between the haplogroup (p=0.06829-0.5574) and haplotype compositions (Fst=-0.00518 to 0.01343,
p=0.21072-0.60608) of the STA and the LBK groups from Transdanubia and Central Europe (Table
1) support a rather homogeneous mtDNA signature of early farming communities from both
regions. The TPC also supports the scenario of population continuity during the Neolithic period,
showing no significant p values among the pairwise compared Neolithic cultures (p>0.177 with
all tested effective population sizes, Dataset S11).
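The haplogroup-based comparison relies on Fisher's exact test. The study applies it to full haplogroup-by-culture count tables; to show the underlying logic with only the Python standard library, the sketch below is deliberately reduced to a 2×2 table (carriers versus non-carriers of one haplogroup in two cultures). It enumerates every table with the observed margins and sums the hypergeometric probabilities of all tables no more likely than the observed one. The function name and the counts in the usage line are hypothetical, not study data.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows: two populations; columns: carriers / non-carriers of one haplogroup.
    Margins are held fixed and all possible values of the top-left cell are
    enumerated.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)

    # Hypergeometric weights as exact integers; each probability is
    # weight / comb(n, row1). Integers avoid floating-point ties.
    weights = {x: comb(col1, x) * comb(n - col1, row1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, row1)


# Hypothetical counts: 1 of 10 carriers in one group versus 11 of 14 in the other.
print(fisher_exact_2x2(1, 9, 11, 3))
```

Extending this to an r×c haplogroup table means enumerating all tables with the same row and column sums, which quickly becomes expensive; this is one reason r×c tests are often run with network algorithms or Monte Carlo sampling.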

We combined our Neolithic samples from the Carpathian Basin with 487 published
mtDNA profiles from Upper Palaeolithic and Mesolithic [28-30,32,45,46], Neolithic
[34-37,40,42,43,45-47] and Early Bronze Age [37] sites across Europe (Dataset S6) and conducted
principal component analysis (PCA), multidimensional scaling (MDS), analysis of molecular
variance (AMOVA) and shared haplotype analysis to compare the mtDNA variability of the STA
and LBKT in a broader geographical and chronological context (Material and Methods).

PCA and MDS show that the mtDNA makeup of the STA and LBKT is strikingly similar to that of
the LBK [34-37] and of subsequent cultures of the 5th/4th millennium BC in Central Europe [37]
(Figure 2, S1, Dataset S7-S8). This similarity is predominantly based on a high number of 'Neolithic
package' lineages and low frequencies of haplogroups attributed to hunter-gatherers, which
clearly distinguish this cluster from hunter-gatherers of Central/North [28-30] and southwest
Europe [32,45,46], but also from Neolithic Iberian populations and Central European cultures of
the 3rd/2nd millennium BC (Figure 2). To exclude biases induced by potential maternal
kinship within the prehistoric datasets, we also performed PCA and MDS with a reduced dataset (*),
in which redundant haplotypes with identical HVS-I and II sequences from the same site
were omitted. The reduced datasets occupy positions on the plots similar to those of the complete
datasets, indicating that the effect of maternal kinship is negligible.
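The reduced datasets (*) keep one representative per site for each identical HVS-I/HVS-II combination. A minimal sketch of that filtering step, assuming each record is a hypothetical `Record(sample_id, site, hvs1, hvs2)` tuple; the field names, the motif strings in the test data and the keep-the-first-occurrence rule are illustrative choices, not the study's documented procedure.

```python
from typing import Iterable, List, NamedTuple, Optional


class Record(NamedTuple):
    sample_id: str
    site: str
    hvs1: str
    hvs2: Optional[str]  # None where HVS-II was not sequenced


def reduce_kinship(records: Iterable[Record]) -> List[Record]:
    """Drop redundant haplotypes with identical HVS-I and HVS-II from the same site.

    The first record of each (site, hvs1, hvs2) combination is kept, so identical
    haplotypes from different sites are all retained. A missing HVS-II (None) is
    treated as its own value, which is a simplifying assumption.
    """
    seen = set()
    kept = []
    for rec in records:
        key = (rec.site, rec.hvs1, rec.hvs2)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept
```

Keying on the site as well as the sequences is what limits the filter to potential intra-site maternal kin: the same haplotype at two different sites is still counted twice.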

We used AMOVA to evaluate whether the observed affinities of the STA and LBKT with the
LBK and the 5th/4th millennium BC cultures from Central Europe result from a shared population
structure. We pooled HVS-I sequences from the STA and LBKT and nine archaeological cultures
from Central Europe ranging from the LBK to the Early Bronze Age [34-37,42,43] into different
groups, and tested 82 different arrangements to identify the constellation with the highest
among-group variance and, simultaneously, low variation within groups (Dataset S9). The
highest among-group variance was observed when the STA and LBKT were arranged in one group
with the Central European LBK and all 5th/4th millennium BC cultures, while the 3rd/2nd
millennium BC cultures were separated into a second group (among-group variation=3.50%,
Fst=0.03501, p=0.00396; within-group variation=0.20%, Fst=0.00203, p=0.31139, Dataset S9).
These results suggest a common genetic structure of the 6th-4th millennium BC cultures.
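AMOVA partitions pairwise sequence differences into variance components among and within populations. The study tested hierarchical arrangements of cultures into groups; to keep the example short, the sketch below is a simplified single-level AMOVA (populations only, with no group level and no permutation test), using the number of differing positions between aligned, equal-length sequences as the squared distance. It illustrates the variance-component arithmetic only and is not the software used in the study; the function names and the toy sequences in the tests are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def n_differences(x: str, y: str) -> int:
    """Number of differing positions between two aligned, equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for p, q in zip(x, y) if p != q)


def ssd(seqs: List[str]) -> float:
    """Sum of squared deviations: sum of d^2 over unordered pairs, divided by n.

    Pairwise differences serve as squared distances (d^2); this equals
    (1 / 2n) times the sum over ordered pairs.
    """
    total = sum(n_differences(a, b) for a, b in combinations(seqs, 2))
    return total / len(seqs)


def phi_st(populations: Dict[str, List[str]]) -> float:
    """Single-level AMOVA Phi_ST for a mapping population -> aligned sequences."""
    pops = [seqs for seqs in populations.values() if seqs]
    n_pops = len(pops)
    n_total = sum(len(p) for p in pops)
    if n_pops < 2 or n_total <= n_pops:
        raise ValueError("need >= 2 populations and more sequences than populations")

    ssd_total = ssd([s for p in pops for s in p])
    ssd_within = sum(ssd(p) for p in pops)
    ssd_among = ssd_total - ssd_within

    # n_c: effective sample size per population, correcting for unequal sizes.
    n_c = (n_total - sum(len(p) ** 2 for p in pops) / n_total) / (n_pops - 1)
    sigma2_within = ssd_within / (n_total - n_pops)
    sigma2_among = (ssd_among / (n_pops - 1) - sigma2_within) / n_c

    total_var = sigma2_among + sigma2_within
    if total_var == 0:
        return 0.0  # no sequence variation at all
    return sigma2_among / total_var
```

Testing group arrangements, as in the study, adds a third level (groups of populations) and compares the among-group component across candidate partitions; negative values, as for identical populations, are a known property of the method-of-moments estimator.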

We used shared haplotype analysis [48] and modified this approach by accounting for the
temporal succession of cultures (ancestral shared haplotype analysis, ASHA). This enabled us to
ascribe mtDNA lineages to particular cultures or time periods according to their first appearance
in the dataset in chronological order (Figure 3, Dataset S10), and to estimate the proportion of
ancestral lineages in each culture potentially derived from hunter-gatherers, STA, LBKT, LBK or
subsequent cultures. The ASHA shows that ancestral hunter-gatherer lineages were rare in
the STA (2.27%), LBKT (0%) and LBK (1.85%), as well as in 5th/4th millennium BC cultures (0%), and
became more common in Central Europe during the 3rd/2nd millennium BC (2.86-11.76%) [37]. In
contrast, we identified a high proportion of ancestral STA lineages in all subsequent cultures
(LBKT=61.54%, LBK=55.56%, 5th/4th millennium BC=36.84-63.64%, 3rd/2nd millennium BC=36.17-43.18%).
The subsequent LBKT had a smaller distinctive influence on its successors, since only
12.96% of the LBK, 0-10.53% of the 5th/4th millennium BC, and 0-3.19% of the 3rd/2nd millennium
BC cultures can be traced back to ancestral lineages first observed in the LBKT. The number of
new 'ancestral' lineages is even lower in the LBK of Central Europe, with no effect on the 3rd/2nd
millennium BC cultures.
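The core bookkeeping of ASHA, attributing each lineage to the culture in which it first appears when cultures are processed in chronological order, can be sketched as below. This is a minimal illustration under stated assumptions: lineages are matched by exact haplotype identity, cultures are supplied already sorted by age, and the function name and toy haplotype labels are hypothetical rather than taken from Dataset S10.

```python
from typing import Dict, List, Sequence, Tuple


def ancestral_shared_haplotypes(
    cultures: Sequence[Tuple[str, List[str]]],
) -> Dict[str, Dict[str, float]]:
    """Attribute each individual's haplotype to the culture where it first appears.

    `cultures` must be in chronological order as (name, haplotypes) pairs.
    Returns, for every culture, the percentage of its individuals whose haplotype
    was first observed in each earlier culture, or in the culture itself
    (i.e. lineages that are new at that point in the sequence).
    """
    first_seen: Dict[str, str] = {}
    result: Dict[str, Dict[str, float]] = {}
    for name, haplotypes in cultures:
        counts: Dict[str, int] = {}
        for h in haplotypes:
            origin = first_seen.get(h, name)  # unseen lineages count as new here
            counts[origin] = counts.get(origin, 0) + 1
        # Register new lineages only after the whole culture has been attributed.
        for h in haplotypes:
            first_seen.setdefault(h, name)
        n = len(haplotypes)
        result[name] = {origin: 100.0 * c / n for origin, c in counts.items()}
    return result
```

Because attribution always points to the earliest occurrence, a lineage present in both the STA and the LBKT is credited to the STA in every later culture, which is the temporal ordering that distinguishes ASHA from plain shared haplotype analysis.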

In order to identify affinities of our Neolithic datasets with present-day populations, we 
collated 67,996 published HVS-I sequences from Eurasian populations and conducted PCA and 
genetic distance mapping (Material and Methods). 

The PCA shows that the frequencies of N1a, T1, T2, K, J and HV, and the absence of Asian
and African lineages in the Carpathian Basin cultures cause a clustering of the STA with
populations of the Near East and the Caucasus, while the LBKT falls between the latter and
populations from South and south-east Europe (Greeks, Bulgarians and Italians), owing to
a higher frequency of haplogroup H in the LBKT (Figure S2, Dataset S12). However, the
high frequencies of haplogroups N1a, T2, and K in the STA and LBKT separate them
from all present-day populations along the third component.

Sequence-based genetic distance maps are largely consistent with PCA and reveal the 
greatest similarities of the STA to populations of the Near East (Iraq, Syria) and the Caucasus 
(Azerbaijan, Georgia, Armenia), as well as some European populations, such as Italy, Austria, 
Romania, and Macedonia (Figure 3a, Dataset S13). The distance map of the LBKT displays 
affinities that are overall similar to the STA, which includes populations from Azerbaijan, Syria, 
and Iraq. We also observe similarities to present-day Europeans, such as the populations of Great 
Britain, Portugal, Romania, Crete, and Russia (Figure 3b, Dataset S13). These similarity peaks are 
likely explained by elevated frequencies of shared lineages due to shared genetic drift in modern-day
populations.
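Genetic distance maps turn distances computed at sampled locations (for example, Fst between an ancient sample and each present-day population) into a continuous surface. The interpolation method actually used is described in Material and Methods; as a stand-in, the sketch below uses simple inverse distance weighting over (longitude, latitude) points with great-circle distances. The function names, the weighting power of 2 and the test coordinates are hypothetical choices for illustration.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometres between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def idw_fst(
    target: Tuple[float, float],
    samples: Sequence[Tuple[float, float, float]],
    power: float = 2.0,
) -> float:
    """Inverse-distance-weighted estimate of Fst at `target` = (lon, lat).

    `samples` holds (lon, lat, fst) for populations with a known distance to the
    ancient sample. If the target coincides with a sample, its value is returned.
    """
    num = den = 0.0
    for lon, lat, fst in samples:
        d = haversine_km(target[0], target[1], lon, lat)
        if d == 0.0:
            return fst
        w = 1.0 / d ** power
        num += w * fst
        den += w
    return num / den
```

Evaluating `idw_fst` over a regular longitude/latitude grid yields a map on which low values mark the present-day regions most similar to the ancient sample; kriging, a common alternative, additionally models spatial autocorrelation.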

Y chromosomal DNA 

We also analysed the non-recombining part of the Y chromosome (NRY) in the
investigated samples, using multiplex [36] and singleplex approaches targeting 33 haplogroup-defining
SNPs. We successfully generated unambiguous NRY SNP profiles for nine male
individuals (STA=7, LBKT=2) (Dataset S3, S5). Three STA individuals belong to NRY haplogroup
F* (M89), two can be assigned to haplogroup G2a2b (S126), and one each to
G2a (P15) and I2a1 (P37.2) (Dataset S3, S5). The two investigated LBKT samples carry
haplogroups G2a2b (S126) and I1 (M253). Furthermore, the incomplete SNP profiles of eight
specimens potentially belong to the same haplogroups: STA: three G2a2b (S126), two G2a (P15),
and one I (M170); LBKT: one G2a2b (S126) and one F* (M89) (Dataset S5).
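The difference between unambiguous and incomplete NRY profiles can be illustrated by walking a haplogroup tree from the root and descending only where a defining SNP is typed as derived. A minimal sketch, assuming a hypothetical, heavily simplified tree built only from the haplogroup/marker pairs named above (the real NRY phylogeny has many intermediate nodes, for example between F and G2a, which are omitted), with calls coded as "D" (derived) or "A" (ancestral) and untyped SNPs simply absent.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified tree: haplogroup -> (defining SNP, child haplogroups).
# Intermediate nodes of the real NRY phylogeny are deliberately omitted.
TREE: Dict[str, Tuple[str, List[str]]] = {
    "F": ("M89", ["G2a", "I"]),
    "G2a": ("P15", ["G2a2b"]),
    "G2a2b": ("S126", []),
    "I": ("M170", ["I1", "I2a1"]),
    "I1": ("M253", []),
    "I2a1": ("P37.2", []),
}
ROOT = "F"


def assign_haplogroup(calls: Dict[str, str]) -> Tuple[Optional[str], bool]:
    """Return (haplogroup, complete) from SNP calls ('D' derived, 'A' ancestral).

    Descends while a child's defining SNP is derived. A trailing '*' marks a
    paragroup (all downstream SNPs typed and ancestral). `complete` is False when
    the walk stops at an untyped downstream SNP, so the sample may belong to a
    deeper subgroup ("potentially" in the text).
    """
    if calls.get(TREE[ROOT][0]) != "D":
        return None, False
    node = ROOT
    while True:
        children = TREE[node][1]
        derived = [c for c in children if calls.get(TREE[c][0]) == "D"]
        if derived:
            node = derived[0]
            continue
        if any(calls.get(TREE[c][0]) is None for c in children):
            return node, False
        return (node + "*" if children else node), True
```

For example, derived M89 with ancestral P15 and M170 yields F*, whereas derived M89 and P15 with S126 untyped yields only G2a and is flagged as incomplete.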

G2a2b and F* are rare in present-day Europe. Haplogroup G and its subgroups increase slightly in
frequency towards the Near East and reach their highest frequencies in populations of the south and
northwest Caucasus [49,50], while haplogroup F* shows a diffuse distribution pattern in
Eurasia, which reflects the insufficient sub-haplogroup resolution of most population
genetic studies. Haplogroups I1 and I2a1 are most frequent in present-day European populations,
with the highest frequencies in Scandinavian [51-53] and southeast European [51]
populations, respectively.

We used PCA and genetic distance maps to identify affinities of the Carpathian Basin 
samples with 49,516 NRY SNP profiles from present-day Eurasian and African populations 
(Material and Methods). Due to the similarities in Y chromosome composition and the small 
number of samples, we pooled STA and both LBK groups. 

The elevated haplogroup G frequency in populations of the west Caucasus results in their
clustering with the STA-LBK group on the second principal component, whereas the predominant
frequencies of haplogroups G and F* lead to a clear separation of the STA-LBK group from all
present-day populations along the third principal component (Figure S4, Dataset S14).
Similarly, the Y chromosome distance map reveals the greatest similarities to populations of
the west and south Caucasus, such as Adyghe, Kabardin, Balkarians, Abkhazians, Azerbaijanis and
Georgians, as well as to Sardinians (Figure 4, Dataset S15), which can be explained by the high
frequency of haplogroup G/G2a [50,54] in these populations. This might reflect genetic drift
caused by isolation and small effective population size after direct gene flow from the Near
East, which led to the fixation of this haplogroup [49]. Intriguingly, populations of the northeast
Caucasus show greater distances to the STA-LBK samples due to a lower abundance of haplogroup
G/G2a [50]. Recently, the genome of an LBK individual from Stuttgart has been shown to be
similar to those of modern-day Sardinians [33], a result that can be explained by the isolation of the
Sardinians, which preserved the Neolithic genetic signature. Nevertheless, our mtDNA population
genetic analyses did not confirm this Neolithic-Sardinian affinity, which we detected only
on the NRY genetic distance map.

Discussion 

This study provides the first in-depth population survey of early farming cultures from the 
Carpathian Basin and south-eastern Europe and demonstrates their essential role in the genesis 
of the first farming communities of Central Europe. Our population genetic analyses (Fisher's 
exact test, PCA, MDS, AMOVA, TPC) reveal a similar haplogroup composition and comparable 
haplotype diversity between the mtDNA variability of the Carpathian Basin cultures and the LBK 
from Central Europe (Table 1), indicating a homogeneous and shared population structure of early
farming communities from both regions (Figure 2, S1).

The ASHA shows that about 55% of the LBK lineages ascribed to characteristic 'Neolithic
package' haplogroups could be traced back to the STA and LBK in Transdanubia (Figure 3, Dataset
S10). It is therefore likely that this mtDNA signature was also present in ancient populations
preceding the STA (7th/6th millennium BC farming groups from the Aegean and the southern
Balkans), in accordance with the archaeological record, which suggests cultural links to regions
further southeast [5]. Interestingly, the STA mtDNA signature was still preserved in Neolithic
cultures of the 5th/4th millennium BC in Central Europe (Figure 3, Dataset S10), attesting to a direct
and enduring genetic legacy of the STA and LBKT in the Central European Neolithic, with minimal
or no additional genetic influence from outside for the subsequent 2,500 years.

Importantly, our comparative analyses (PCA and genetic distance maps with modern
population data) indicate that both the mtDNA and NRY variability observed in the Carpathian
Basin samples most likely originated in the Near East, with connections to the Caucasus (Figure 4,
S2-4), which is in accordance with previous mtDNA studies of the Central European LBK [36,37]
and subsequent farming cultures of the 5th/4th millennium BC [37]. The continuation of lineages
through space and time suggests a scenario in which the genetic makeup of early farmers
originated in the Near Eastern Fertile Crescent, from where it spread to Central Europe via the
western Carpathian Basin, a region that acted as a natural corridor and an adaptation zone
during the Neolithic expansion. The shared Near Eastern affinities of the STA, LBKT and LBK, and
the genetic continuity in the maternal and paternal gene pools, are consistent with the
archaeological record, which describes the genesis of the early LBK (LBKT) from STA
communities, followed by a rapid dispersal of the early LBK culture from Transdanubia towards
the north-western part of Central Europe [3,9,13]. A recent aDNA study of Near Eastern farmers
from 8,000 BC raises the question of whether modern Near Eastern mtDNA can be used as a
proxy for Near Eastern Neolithic variability [44]. In our opinion, the seven newly described,
incomplete HVS-I haplotypes (np 16095-16369) provide only a limited basis for
comparative aDNA analyses, and we therefore still consider modern-day Near Eastern genetic data
to be sufficient proxies when tracing the origin of the first European farmers. A recent study using
ancient genomic data of the 'Stuttgart' LBK individual, the Tyrolean Iceman, and a Scandinavian
farmer (Gök4) has shown a south European rather than Near Eastern affinity of these 'early
farmers', and has estimated a western hunter-gatherer ancestry of 0-45% in the early farmers'
gene pool [33]. These results do not contradict ours, since uniparental markers behave more
conservatively: they could preserve the Near Eastern signature more consistently, even if admixture
with foragers occurred on the way to Central Europe. Furthermore, the results are not directly
comparable with ours, since we used earlier Neolithic specimens from a region that was
nearer to the source region than those analysed in the study by Lazaridis et al. [33].

The very low frequencies of hunter-gatherer lineages (0-2.27%) in the STA, LBKT and LBK
sample sets (Figure 3) indicate that the arrival of agriculture in the Carpathian Basin and Central
Europe was accompanied by a strong reduction of the currently known Mesolithic mtDNA
substratum, resulting in a distinct and contrasting mtDNA haplogroup composition and
significant differences between European hunter-gatherers and the Early Neolithic cultures
(Figure 2-3, S1, Table 1, Dataset S7-8, S10-11). This scenario is consistent with coalescent-based
simulations that have revealed genetic discontinuity between Central European hunter-gatherers
and LBK communities [28,36]. The detection of haplogroup U5b in the investigated Mesolithic
skeleton from Croatia matches previous observations, which describe sub-haplogroups of U as
most frequent in forager populations across Europe, forming a characteristic Mesolithic mtDNA
genetic substratum [28,37]. Residual Neolithic hunter-gatherer isolates, as reported from Central
Europe by Bollongino et al. [30], have not yet been observed in our study region. Given
the low proportion of hunter-gatherer mtDNA lineages in the LBK gene pool, we assume that
admixture between hunter-gatherers and colonizing LBK farmers was negligible in Central
Europe. Considering the relative size and speed of the LBK expansion, we must assume
substantial population growth during the earliest LBKT, which might have resulted in
population pressure and led to emigration from Transdanubia [55]. Although such a radical
population increase is not evident in the Early Neolithic archaeological record [7],
recent extensive archaeological excavations have provided new insights into large-scale early
LBKT settlements in western Hungary [9,56,57], which suggest larger source communities for a
possible colonization than previously assumed.

Y chromosomal population genetic studies of modern-day Europeans have proposed that
NRY haplogroups I1 and I2a1 have been present in Europe since the Late Upper Palaeolithic. This
proposal was based on consistently high divergence time estimates [51,58], suggesting an expansion from
Franco-Cantabrian (I1) and southeast European (I2a1) glacial refugia after the Last Glacial
Maximum [51]. I2a1 has recently been described in Mesolithic specimens from Loschbour
(Luxembourg) and Motala (Sweden) [33], in a Scandinavian Neolithic hunter-gatherer from Ajvide
(Sweden, 2,900-2,600 BC) [38], as well as in Neolithic remains from southern France and northern
Spain [40,41]. A further three men from the Mesolithic Motala site could be assigned to
haplogroup I [33]. The fact that almost all Mesolithic males belong to haplogroup I suggests that
this haplogroup might represent a pre-farming legacy of the NRY variation in Europe.

Y chromosome haplogroups from STA and LBKT samples, such as haplogroups G2a2b and 
F* have also been reported from the Central European LBK [36], and support a close genetic 
relationship of the paternal lineages. Genetic studies on modern-day populations have discussed 
haplogroup G [25,59] and its subgroup G2a as potential representatives of the spread of farming 
from the Near East to Europe [26]. This scenario has recently been supported by Neolithic data 
from northern Spain [40] and southern France [41], which attested G2a a pivotal role in the 
Neolithic expansion on the Mediterranean route. Furthermore, G2a has also been reported from 
the Tyrolean Iceman (G2a2alb (L91)) [39]. Taken together, these findings suggest that sub- 
haplogroups of G2a were frequent in Neolithic populations of the 6 th -4 th millennia BC across 
Europe. Thus, if we take Y chromosomal haplogroup 12a (and possibly II) as proxy for a 
Mesolithic paternal genetic substratum in Europe, we observe a similar pattern to the 
changeover in the mitochondrial DNA variability, in which NRY G lineages dominate Neolithic 
populations across Europe and I lineages become rare [36,39-43]. 
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The most characteristic mtDNA haplogroup of early farmers from the Carpathian Basin 
and Central Europe is Nla. Nla has previously been discussed as a potential marker of the 
spread of farming [34]. The presence of Nla in early farmers from the Carpathian Basin (6.82- 
10.26%) and Central Europe (12.04%, Table 1) lends further support to its pivotal role as a 
marker for the Continental route of the Neolithic expansion. On the other hand, mtDNA Nla and 
NRY G2a haplogroups are rare in present-day European populations, which is also reflected in 
the separation of the 6 th millennium BC cultures from all present-day populations along the third 
principal component on the PCA plots (Figure S2, S4). These findings indicate further 
demographic events after the Early/Middle Neolithic period that shaped modern-day mtDNA and 
NRY variability. Recent evidence from ancient mtDNA has described the formation of modem- 
day variability by several successive migration events in Central Europe during the 3 rd /2 nd 
millennium BC [37]. It is highly likely that these events have also affected the NRY diversity. 
Surprisingly, Y chromosome haplogroups, such as Elblbl (M35), Elblblal (M78), Elblblb2a 
(M123), J2 (M172), Jl (M267), and Rlbla2 (M269), which were claimed to be associated with the 
Neolithic expansion [23-25], have not been found so far in the 6 th millennium BC of the 
Carpathian Basin and Central Europe. Intriguingly, Rla and Rib, which represent the most 
frequent European Y chromosome haplogroups today, have been reported from cultures that 
emerged in Central Europe during the 3 rd /2 nd millennium BC, while a basal R type has been 
reported from a Palaeolithic sample in Siberia [60] in agreement with a proposed Central 
Asian/Siberian origin of this lineage. In contrast, G2a has not been detected yet in late Neolithic 
cultures [42,43]. This suggests further demographic events in later Neolithic or post-Neolithic 
periods. However, we caution that the NRY record is still very small, especially in more recent 
periods, and further ancient Y data are required to shed light on the formation of the modern- 
day paternal diversity. 
Interestingly, recent model-based statistical analyses of contemporary NRY and mtDNA
data, testing a series of population scenarios for the Neolithic transition, have revealed a shared
admixture history for men and women, but not the same demographic history [61]. This study
showed that females had a larger effective population size, probably due to differential effects
of social and cultural practices, including increasing sedentism alongside a shift to monogamy and
patrilocality in early farmers. It is therefore important to interpret our new genetic data in the
light of those findings. Considering the entire set of 32 published NRY records available for
Neolithic Europe thus far, the low paternal diversity is indeed remarkable: G2a is the
prevailing haplogroup in the Central European and Carpathian Basin Neolithic, and in the French and
Iberian Neolithic datasets [36,40,41]. There are only two exceptions, namely one E1b1b (V13)
[41] individual from the Avellaner cave in Spain (~5,000-4,500 BC), and two I2a [40] individuals
from Treilles, France (~3,000 BC). This very limited variation in NRY haplogroups, in contrast to
the high mtDNA haplogroup diversity, suggests a larger effective population size for females
than for males. One plausible explanation for this phenomenon is patrilocality (where women move
to their husband's birthplace after marriage). Other possibilities that could lead to similar
observations include polygyny or male-biased adult mortality. A patrilocal residential rule was
possibly linked to a system of descent along the father's line (patrilineality) in early farming
communities. Ethnographic studies have suggested a change of residential rules at the advent of
Neolithisation, showing different trends in residential rules among modern foragers and
non-foragers [62]. Increasing sedentism promotes territorial defence and control of resources,
favouring men in the inheritance of land and property, which consequently leads to patrilocal
residence [62]. At the same time, such residence patterns must have remained flexible in
expanding populations, allowing some of the sons to settle in new territories following
population pressure and natural limitation of resources, e.g. after the carrying capacity of a
particular region had been reached [61]. Patrilocality has also been raised in recent
bioarchaeological studies: it has been suggested by aDNA evidence for the Treilles Neolithic
community [41], and by stable isotope studies for the LBK in Central Europe [63].

It is important to note that patrilocality does not contradict the demic diffusion model, and
it appears that both phenomena have left a discernible mark on European Neolithic genetic
diversity. While patrilocality and patrilineality might have caused high mtDNA and low NRY
diversity within populations, the demic diffusion model best explains the mtDNA and NRY affinity
of the early farmers to the modern Near East and Caucasus, and the observed overall genetic
homogeneity across a vast territory of south-eastern and Central Europe. Importantly, local
processes of sex-biased migration are unlikely to affect genetic variation at broader spatial
scales. Our observations from many sites in Europe therefore argue for a common set of cultural
and social practices among early farming cultures across large distances in Europe.
However, we caution that the observed differences in genetic diversity between males and
females could also be influenced by resolution biases resulting from the different sets of
mtDNA and NRY markers studied. Examining the sex-specific dynamics of early farmers is an important
area that warrants further detailed research, in order to address underlying parameters such as
migration rate, level of exogamy and distances of marriage-related dispersals.

The 83 novel mtDNA and nine novel NRY profiles from early farming Neolithic populations of the
Carpathian Basin, together with one Mesolithic mtDNA profile, help to fill the geographic gap on
the continental route of the Neolithic expansion from the Near Eastern Fertile Crescent to Central
Europe. The joint analyses of mitochondrial and Y chromosomal DNA data support the demic
diffusion of early farmer men and women through western Hungary, and demonstrate the
paramount importance of this region as a prehistoric migration corridor. We point out that
archaeological cultures of the Carpathian Basin provided the genetic basis of the first Central
European farmers, which affected subsequent prehistoric cultures for a long period of time.
Additionally, the new NRY data complement the sparse European Y chromosomal dataset and
lend further support to patrilocal residential rules and a patrilineal social system among the first
farmers, underlining the role of demographic factors that, depending strongly on cultural
practices, notably shaped prehistoric and extant genetic diversity.
Material and methods 

We sampled one Mesolithic, 47 Starcevo and 61 LBKT skeletons excavated in Croatia and
western Hungary. The ancient DNA work was carried out at the Institute of Anthropology,
Johannes Gutenberg University of Mainz, following well-established protocols [34,36,37,42]. For
minor modifications to the procedures for HVS-I and HVS-II typing and coding-region SNP typing of
the mitochondrial genome, and for SNP typing of the Y chromosome, see the Supplementary
Information and Datasets S2-S4.

The genetic results were evaluated by population genetic analyses using comparative
ancient and modern DNA datasets (see Datasets S6 and S12-S15). We performed Fst and
AMOVA analyses in Arlequin 3.5.1 [48]. Furthermore, we conducted Fisher's exact tests, PCA and
MDS in the R software environment. PCAs were based on mtDNA and Y chromosome haplogroup
frequencies (Datasets S7, S12 and S14). MDS was based on Slatkin's linearized Fst values
calculated from mitochondrial HVS-I sequences (Dataset S8). Haplotype diversity was computed in
DnaSP version 5.10.01 [64].
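The MDS step on Slatkin's linearized Fst, Fst/(1 - Fst), can be illustrated with a short sketch. This is a minimal Python/NumPy example, not the authors' R code: it assumes classical (Torgerson) MDS, which is what R's cmdscale computes, and it assumes that negative Fst estimates are truncated to zero before linearization. As toy input it uses the four-population Fst values reported in Table 1 rather than the full comparative dataset (Dataset S8).

```python
import numpy as np

# Pairwise Fst values (HVS-I) among the four populations reported in Table 1.
labels = ["HGCN", "STA", "LBKT", "LBK"]
fst = np.array([
    [0.0,     0.18810, 0.18489,  0.17989],
    [0.18810, 0.0,     0.01343,  0.00142],
    [0.18489, 0.01343, 0.0,     -0.00518],
    [0.17989, 0.00142, -0.00518, 0.0],
])


def slatkin_linearized(fst):
    """Slatkin's linearized Fst, Fst / (1 - Fst).

    Assumption: negative Fst estimates are truncated to 0 first.
    """
    f = np.clip(fst, 0.0, None)
    return f / (1.0 - f)


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix d."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n    # centring matrix
    b = -0.5 * j @ (d ** 2) @ j             # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]        # indices of the k largest
    # Coordinates are eigenvectors scaled by sqrt(eigenvalue); negative
    # eigenvalues (the non-Euclidean part) are clipped to 0.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


dist = slatkin_linearized(fst)
coords = classical_mds(dist)
for name, (x, y) in zip(labels, coords):
    print(f"{name:5s} {x: .4f} {y: .4f}")
```

With these inputs the hunter-gatherers plot far from the three farming populations, which cluster tightly; the sign and orientation of MDS axes are arbitrary.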

MtDNA HVS-I sequence data and Y chromosomal haplogroup frequencies were used to calculate
genetic distances, comparing the STA and LBKT mitochondrial DNA datasets and a combined
STA-LBK Y chromosomal dataset with 130 and 100 modern populations, respectively (Datasets
S13 and S15). Genetic distance maps were generated from the Fst values in ArcGIS version 10.0.

In the ASHA, a modified version of the shared haplotype analysis [48], each HVS-I
lineage within a given cultural dataset was traced back to its earliest appearance in a defined
chronological order of the studied cultures. Each lineage was classified either as ancestral or as
new, and was named after the culture in which it was first detected (Dataset S10,
Figure 3).
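The lineage-assignment step of the ASHA can be sketched as follows. This is a hypothetical minimal Python version, not the authors' implementation: it assumes that HVS-I lineages are matched by exact identity of haplotype strings, that each list entry is one individual, and it uses invented toy labels ("a" to "f") in place of real haplotypes. The full ASHA rules are described in the Supplementary Information.

```python
from collections import Counter


def asha(cultures):
    """Sketch of an ancestral shared haplotype assignment.

    cultures: list of (culture_name, haplotypes) in chronological order,
    where haplotypes lists one HVS-I haplotype string per individual.
    Returns, for each culture, the proportion of its individuals whose
    haplotype first appeared in each earlier (or the same) culture.
    """
    first_seen = {}  # haplotype -> name of the earliest culture carrying it
    result = {}
    for name, haplotypes in cultures:
        for h in haplotypes:
            # A haplotype not seen before is a "new" lineage of this culture.
            first_seen.setdefault(h, name)
        counts = Counter(first_seen[h] for h in haplotypes)
        total = len(haplotypes)
        result[name] = {origin: n / total for origin, n in counts.items()}
    return result


# Hypothetical toy data; "a"-"f" stand in for HVS-I haplotypes.
toy = [
    ("HGCN", ["a", "a", "b"]),
    ("STA",  ["c", "c", "d"]),
    ("LBKT", ["c", "e"]),
    ("LBK",  ["a", "c", "d", "f"]),
]
for culture, shares in asha(toy).items():
    print(culture, shares)
```

In this toy run, half of the LBK individuals carry lineages first seen in the STA, one quarter carry a hunter-gatherer lineage and one quarter a lineage new to the LBK; per-culture proportions like these are what Figure 3 plots as stacked bars.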
The adjusted parameters of the TPC [37] (Dataset S11), as well as further details of each
analysis, are given in the Supplementary Information.
Glossary 



aDNA Ancient DNA 

AMOVA Analysis of molecular variance 

ASHA Ancestral shared haplotype analysis 

HVS-I/II Hypervariable Segment I or II of the mitochondrial genome 

LBK Linearbandkeramik or Linear Pottery culture in Central Europe (refers to published LBK data from the Czech Republic, Lower Austria and Germany) 

LBKT Linearbandkeramik or Linear Pottery culture in western Hungary/Transdanubia 

MDS Multidimensional scaling 

mtDNA Mitochondrial DNA 

np Nucleotide position 

NRY Non-recombining part of the Y chromosome 

PCA Principal component analysis 

SNP Single nucleotide polymorphism 

STA Starcevo culture 

TPC Test of population continuity 
Figures in the main text 

[Figure 1 map legend. Starcevo culture (6000-5450 BC): 1 Alsónyék-Bátaszék, Mérnöki t.; 2 Lánycsók-Csata-alja; 3 Lánycsók-Gata-Csotola; 4 Vinkovci-Nama; 5 Vinkovci-Jugobanka; 6 Vukovar-Gimnazija. LBK in Transdanubia (5600-4900 BC): 7 Balatonszárszó-Kis-erdei-dűlő; 8 Balatonszemes-Bagodomb; 9 Bölcske-Gyűrűsvölgy; 10 Budakeszi-Szőlőskert-T.; 11 Harta-Gátőrház; 12 Kóny, Proletár-dűlő 85; 13 Szemely-Hegyes; 14 Tolna-Mözs. LBK culture (5500-4800 BC): 15 Asparn/Schletz; 16 Vedrovice; 17 Flomborn; 18 Schwetzingen; 19 Vaihingen; 20 Seehausen; 21 Derenburg, Meerenstieg II; 22 Halberstadt, Sonntagsfeld; 23 Oberwiederstedt I; 24 Eilsleben; 25 Karsdorf; 26 Naumburg.]

Figure 1. Geographic distribution of the Starcevo and LBK cultures and locations of the studied sites. 

The shaded areas of the maps show the distribution of the Starcevo culture (STA) and of the LBK in
Transdanubia (LBKT) and Central Europe (LBK) [2,3,5,7,9]. The arrows show the direction of the
farmers' expansion into Central Europe, as suggested by the archaeological records cited above. Coloured
points indicate the studied sites of the STA (dark red) and LBKT (red) in western Hungary and northern
Croatia. Central European LBK sites (brown) that were included in the comparative analyses are also
shown. 
Figure 2. PCA plot comparing mtDNA data of 16 prehistoric cultures. 

PCA is based on the frequencies of 22 mtDNA haplogroups in the STA, LBKT and 14 other prehistoric cultures.
Color shadings and symbols denote cultures of different periods or European regions: hunter-gatherers
(grey triangles), 6th millennium BC Carpathian Basin cultures (red stars), LBK and 5th/4th millennium BC
cultures in Central Europe (rose circles), Central European 3rd/2nd millennium BC cultures (blue circles),
and 6th/5th millennium BC cultures of the Iberian Peninsula (green hexagons). The reduced version of each
dataset is marked by an asterisk (*). The contribution of each haplogroup is superimposed as a grey
component loading vector. The first component (24.1%) clearly separates the hunter-gatherer populations
and the Iberian cultures from the cluster of STA, LBKT, LBK and subsequent Central European 5th/4th
millennium BC cultures, while Central European cultures from the 3rd/2nd millennium BC are differentiated
along the second component (16.6%). The PCA thus shows the maternal affinity of the Carpathian Basin
cultures to the LBK and to the 5th/4th millennium BC cultures in Central Europe. Detailed information about
the comparative data and haplogroup frequencies is listed in table S7. 
Culture abbreviations: hunter-gatherers in Central and North Europe (HGCN), hunter-gatherers in
southwestern Europe (HGSW), Starcevo culture (STA), LBK in Transdanubia (LBKT), LBK in Central Europe
(LBK), Rössen culture (RSC), Schöningen group (SCG), Baalberge culture (BAC), Salzmünde culture (SMC),
Bernburg culture (BEC), Corded Ware culture (CWC), Bell Beaker culture (BBC), Únětice culture (UC),
Cardial and Epicardial culture (CAR), Neolithic Basque Country and Navarre (NBQ), Neolithic Portugal
(NPO). 
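A PCA of haplogroup frequencies can be sketched with a singular value decomposition of the column-centred frequency matrix, which matches the default behaviour of R's prcomp (centring without scaling; whether the original analysis scaled the variables is an assumption not settled here). This minimal Python sketch uses only the four cultures and the 16 haplogroup columns of Table 1, so it does not reproduce Figure 2, which is based on 16 cultures and 22 haplogroups; the last column, whose label is missing in the source table, is kept under a placeholder name.

```python
import numpy as np

cultures = ["HGCN", "STA", "LBKT", "LBK"]
haplogroups = ["H", "HV", "V", "J", "K", "N1a", "T1", "T2", "U", "U2",
               "U3", "U4", "U5a", "U5b", "W", "unlabelled"]
# Haplogroup frequencies (%) from Table 1; "unlabelled" is a placeholder
# for the column whose label is missing in the source.
freq = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 10.53, 5.26, 0, 10.53, 21.05, 52.63, 0, 0],
    [6.82, 2.27, 6.82, 11.36, 27.27, 6.82, 2.27, 20.45,
     0, 0, 2.27, 2.27, 0, 0, 4.55, 6.82],
    [30.77, 2.56, 2.56, 7.69, 12.82, 10.26, 2.56, 25.64,
     0, 2.56, 0, 0, 2.56, 0, 0, 0],
    [16.67, 4.63, 4.63, 12.04, 20.37, 12.04, 0, 22.22,
     0, 0, 0.93, 0, 1.85, 0.93, 2.78, 0.93],
])


def pca(x):
    """PCA via SVD of the column-centred matrix (no scaling)."""
    centred = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u * s                        # sample coordinates on each PC
    explained = s ** 2 / np.sum(s ** 2)   # proportion of variance per PC
    return scores, vt.T, explained        # loadings: one column per PC


scores, loadings, explained = pca(freq)
for name, row in zip(cultures, scores):
    print(f"{name:5s} PC1={row[0]: .2f} PC2={row[1]: .2f}")
print("explained variance:", np.round(explained, 3))
```

The loadings matrix plays the role of the grey component loading vectors in Figure 2; the sign of each component is arbitrary.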

[Figure 3 legend: HGCN, STA, LBKT, LBK, other (RSC, SCG, BAC, SMC, BEC, CWC, BBC, UC)]
Figure 3. Results of the ancestral shared haplotype analysis. 

The bar plot shows the proportions of ancestral mtDNA lineages associated with hunter-gatherers (grey), STA
(dark red), LBKT (red), LBK (brown) and other subsequent cultures (white) in the Central/North European
hunter-gatherer dataset, the two Carpathian Basin cultures and nine Central European cultures ranging
from the LBK to the Early Bronze Age. The mtDNA variability of the Early Neolithic STA from the western
Carpathian Basin had a major influence on the mitochondrial gene pool of the Central European Neolithic
cultures, an influence that persisted at least from the LBK to the Early Bronze Age. Details of the ASHA are
provided in table S10. Culture abbreviations: Hunter-gatherers in Central and North Europe (HGCN),
Starcevo culture (STA), LBK in Transdanubia (LBKT), LBK in Central Europe (LBK), Rössen culture (RSC),
Schöningen group (SCG), Baalberge culture (BAC), Salzmünde culture (SMC), Bernburg culture (BEC),
Corded Ware culture (CWC), Bell Beaker culture (BBC), Únětice culture (UC). 




Figure 4. Genetic distance map of the STA-LBK Y chromosomal data. 

Y chromosomal genetic distances (Fst) were computed between the STA-LBK samples and 100 present-day
populations of Eurasia and North Africa and visualized on a geographic map. Grey dots denote the
locations of present-day populations. Color shadings indicate the degree of similarity of the
Neolithic samples to modern-day populations: short genetic distances (high similarity) are marked by red
areas. Fst values were scaled in intervals of 0.01; Fst values higher than 0.21 were not differentiated
(grey areas). The map shows remarkable affinities of the STA-LBK samples to present-day populations of the
northwest and south Caucasus. Population information and Fst values are listed in table S15. 
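The colour classes described in the Figure 4 caption can be expressed as a small helper. This is a hypothetical Python sketch of one reading of the caption (classes 0.01 wide starting at 0, with values above 0.21 pooled into the undifferentiated grey class), not the ArcGIS settings actually used; placing negative Fst estimates in the lowest class is also an assumption.

```python
import math


def fst_colour_class(fst, step=0.01, max_fst=0.21):
    """Return the colour-class index for an Fst value, or None for the
    undifferentiated (grey) class above max_fst.

    Assumptions: classes are `step` wide starting at 0, and negative
    Fst estimates fall into the lowest class.
    """
    if fst > max_fst:
        return None
    # round() guards against floating-point error, e.g. 0.03 / 0.01.
    return max(0, math.floor(round(fst / step, 9)))


for value in (0.0123, 0.1881, 0.25):
    print(value, fst_colour_class(value))
```

Rounding before flooring keeps values that sit exactly on a class boundary in the intended class despite binary floating-point error.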
Table in the main text 

Culture name                                   Abbreviation     n   References
Hunter-gatherers in Central and North Europe   HGCN            19   28-30, this study
Starcevo culture                               STA             44   this study
Linearbandkeramik culture in Transdanubia      LBKT            39   this study
Linearbandkeramik culture in Central Europe    LBK            108   [missing]

mtDNA haplogroup frequencies (%)
Haplogroup           HGCN      STA     LBKT      LBK
H                    0.00     6.82    30.77    16.67
HV                   0.00     2.27     2.56     4.63
V                    0.00     6.82     2.56     4.63
J                    0.00    11.36     7.69    12.04
K                    0.00    27.27    12.82    20.37
N1a                  0.00     6.82    10.26    12.04
T1                   0.00     2.27     2.56     0.00
T2                   0.00    20.45    25.64    22.22
U                   10.53     0.00     0.00     0.00
U2                   5.26     0.00     2.56     0.00
U3                   0.00     2.27     0.00     0.93
U4                  10.53     2.27     0.00     0.00
U5a                 21.05     0.00     2.56     1.85
U5b                 52.63     0.00     0.00     0.93
W                    0.00     4.55     0.00     2.78
[label missing]      0.00     6.82     0.00     0.93

Pairwise comparisons
Pair         Fisher p     Fst       Fst p ± SD           Adjusted Fst p
HGCN-STA     0.00010      0.18810   0.00000 ± 0.0000     0.00000
HGCN-LBKT    0.00010      0.18489   0.00000 ± 0.0000     0.00000
HGCN-LBK     0.00010      0.17989   0.00000 ± 0.0000     0.00000
STA-LBKT     0.06829      0.01343   0.14048 ± 0.0033     0.21072
STA-LBK      0.26550      0.00142   0.33858 ± 0.0047     0.40630
LBKT-LBK     0.55740     -0.00518   0.60608 ± 0.0046     0.60608

Table 1. mtDNA haplogroup frequencies, Fisher's exact test and genetic distances of four prehistoric
metapopulations and archaeological cultures. 

Fisher's exact test was based on mtDNA haplogroup frequencies. Genetic distances (Fst values,
italicized) were calculated from HVS-I sequences (np 16056-16400). Genetic distance p values were
adjusted post hoc for multiple comparisons using the Benjamini-Hochberg method (italicized). Culture
and population information is presented in Dataset S6. 
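The adjusted p values in Table 1 are consistent with the standard Benjamini-Hochberg step-up procedure. As a check, this minimal Python sketch (not the authors' code; R's p.adjust with method = "BH" computes the same quantity) applies the procedure to the six raw Fst p values and returns 0.21072, 0.40630 and 0.60608 for the three farmer-farmer comparisons.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (step-up) adjusted p values, in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Raw Fst p values for the six pairwise comparisons in Table 1.
raw = {
    "HGCN-STA": 0.0, "HGCN-LBKT": 0.0, "HGCN-LBK": 0.0,
    "STA-LBKT": 0.14048, "STA-LBK": 0.33858, "LBKT-LBK": 0.60608,
}
adjusted = benjamini_hochberg(list(raw.values()))
for pair, p in zip(raw, adjusted):
    print(f"{pair:10s} {p:.5f}")
```

For example, STA-LBKT has the fourth-smallest of six p values, so its adjusted value is 0.14048 × 6/4 = 0.21072, which is below the adjusted values of the two larger p values and is therefore kept.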